This paper presents the design and control of a new Variable Stiffness Series Elastic Actuator (VSSEA). The actuator is built using a modular mechanical design approach that allows us to effectively optimise its stiffness modulation characteristics and power density. The proposed VSSEA possesses the following features: i) no limitation in the work-range of the output link, ii) a wide range of stiffness modulation (~20 Nm/rad to ~1 kNm/rad), iii) low-energy-cost stiffness modulation at equilibrium and non-equilibrium positions, iv) compact design and high torque density (~36 Nm/kg), and v) high-speed stiffness modulation (~3000 Nm/rad/s). Such features can help boost the safety and performance of many advanced robotic systems, e.g., a cobot that physically interacts with unstructured environments and an exoskeleton that provides physical assistance to human users. These features also enable us to use the variable stiffness property to accomplish various regulation and trajectory tracking control tasks with conventional controllers alone, eliminating the need to synthesise complex motion control systems for compliant actuation. To this end, it is experimentally demonstrated that the proposed VSSEA can precisely track desired position and force control references using conventional Proportional-Integral-Derivative (PID) controllers.
translated by 谷歌翻译
Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally inefficient and memory-hungry, bottlenecking the entire genome analysis pipeline. However, for many applications, the majority of reads do not match the reference genome of interest (i.e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first fast and widely-applicable pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall's key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads; and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. TargetCall filters out all off-target reads before basecalling, and the highly-accurate but slow basecalling is performed only on the raw signals whose noisy reads are labeled as on-target. Our thorough experimental evaluations using both real and simulated data show that TargetCall 1) improves the end-to-end basecalling performance of the state-of-the-art basecaller by 3.31x while maintaining high (98.88%) sensitivity in keeping on-target reads, 2) maintains high accuracy in downstream analysis, 3) precisely filters out up to 94.71% of off-target reads, and 4) achieves better performance, sensitivity, and generality compared to prior works. We freely open-source TargetCall at https://github.com/CMU-SAFARI/TargetCall.
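The two-stage flow described above (cheap noisy basecall, similarity check, then accurate basecalling only for on-target reads) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `light_call` and `accurate_call` are hypothetical callables standing in for LightCall and the slow basecaller, and the k-mer overlap test is a simple stand-in for the Similarity Check, not the matching method TargetCall actually uses.

```python
from typing import Callable, Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_on_target(noisy_read: str, ref_kmers: Set[str], k: int, min_frac: float) -> bool:
    """Hypothetical similarity check: fraction of read k-mers found in the reference."""
    read_kmers = kmers(noisy_read, k)
    if not read_kmers:
        return False  # read shorter than k: nothing to match, treat as off-target
    shared = len(read_kmers & ref_kmers)
    return shared / len(read_kmers) >= min_frac


def filter_then_basecall(
    signals: Iterable[str],
    light_call: Callable[[str], str],     # assumed: fast, noisy basecaller
    accurate_call: Callable[[str], str],  # assumed: slow, accurate basecaller
    reference: str,
    k: int = 4,
    min_frac: float = 0.5,
) -> List[str]:
    """Run the expensive basecaller only on signals whose noisy read looks on-target."""
    ref_kmers = kmers(reference, k)
    reads = []
    for signal in signals:
        noisy = light_call(signal)
        if is_on_target(noisy, ref_kmers, k, min_frac):
            reads.append(accurate_call(signal))
    return reads
```

The design point the sketch tries to capture is that the expensive call sits behind the filter, so its cost is paid only for reads that survive the cheap check; the values of `k` and `min_frac` are arbitrary placeholders.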
In this paper we look into the conjecture of Entezari et al. (2021) which states that if the permutation invariance of neural networks is taken into account, then there is likely no loss barrier to the linear interpolation between SGD solutions. First, we observe that neuron alignment methods alone are insufficient to establish low-barrier linear connectivity between SGD solutions due to a phenomenon we call variance collapse: interpolated deep networks suffer a collapse in the variance of their activations, causing poor performance. Next, we propose REPAIR (REnormalizing Permuted Activations for Interpolation Repair) which mitigates variance collapse by rescaling the preactivations of such interpolated networks. We explore the interaction between our method and the choice of normalization layer, network width, and depth, and demonstrate that using REPAIR on top of neuron alignment methods leads to 60%-100% relative barrier reduction across a wide variety of architecture families and tasks. In particular, we report a 74% barrier reduction for ResNet50 on ImageNet and 90% barrier reduction for ResNet18 on CIFAR10.
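Variance collapse and the rescaling idea can be seen in a toy, single-neuron setting. The sketch below assumes a first-layer neuron, where the preactivation of the midpoint network is simply the average of the two endpoint preactivations, and it assumes the two endpoints are uncorrelated Gaussians. The function name `repair_rescale` and the choice to match the interpolated endpoint mean and standard deviation are illustrative assumptions, not the authors' implementation, which operates on deep networks with statistics gathered from data.

```python
import random
import statistics


def repair_rescale(z_mid, z_a, z_b, alpha=0.5):
    """Rescale interpolated preactivations so their mean and standard deviation
    match the alpha-interpolation of the two endpoint networks' statistics."""
    target_mean = (1 - alpha) * statistics.fmean(z_a) + alpha * statistics.fmean(z_b)
    target_std = (1 - alpha) * statistics.pstdev(z_a) + alpha * statistics.pstdev(z_b)
    mid_mean = statistics.fmean(z_mid)
    mid_std = statistics.pstdev(z_mid)
    return [(z - mid_mean) / mid_std * target_std + target_mean for z in z_mid]


# Toy data: preactivations of one neuron in two independently trained networks,
# modelled here as uncorrelated unit-variance Gaussians (an assumption).
rng = random.Random(0)
z_a = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
z_b = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Midpoint interpolation: the averaged signal has a visibly smaller spread.
z_mid = [0.5 * (a + b) for a, b in zip(z_a, z_b)]
z_fixed = repair_rescale(z_mid, z_a, z_b)

print("endpoint std:", statistics.pstdev(z_a), statistics.pstdev(z_b))
print("midpoint std:", statistics.pstdev(z_mid))
print("rescaled std:", statistics.pstdev(z_fixed))
```

For uncorrelated endpoints the midpoint variance is roughly half of the endpoint variance, which is the collapse the abstract refers to; better neuron alignment raises the correlation and reduces the effect but does not remove it.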
Resistive Random-Access Memory (RRAM) is well-suited to accelerate neural network (NN) workloads as RRAM-based Processing-in-Memory (PIM) architectures natively support highly-parallel multiply-accumulate (MAC) operations that form the backbone of most NN workloads. Unfortunately, NN workloads such as transformers require support for non-MAC operations (e.g., softmax) that RRAM cannot provide natively. Consequently, state-of-the-art works either integrate additional digital logic circuits to support the non-MAC operations or offload the non-MAC operations to CPU/GPU, resulting in significant performance and energy efficiency overheads due to data movement. In this work, we propose NEON, a novel compiler optimization to enable the end-to-end execution of the NN workload in RRAM. The key idea of NEON is to transform each non-MAC operation into a lightweight yet highly-accurate neural network. Utilizing neural networks to approximate the non-MAC operations provides two advantages: 1) We can exploit the key strength of RRAM, i.e., highly-parallel MAC operation, to flexibly and efficiently execute non-MAC operations in memory. 2) We can simplify RRAM's microarchitecture by eliminating the additional digital logic circuits while reducing the data movement overheads. Acceleration of the non-MAC operations in memory enables NEON to achieve a 2.28x speedup compared to an idealized digital logic-based RRAM. We analyze the trade-offs associated with the transformation and demonstrate feasible use cases for NEON across different substrates.
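Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or by memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, and different PIM approaches lead to different trade-offs. Our goal is to analyze DRAM-based PIM architectures in terms of NN performance and energy efficiency. To do so, we analyze three state-of-the-art PIM architectures: (1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip; (2) Mensa, a 3D-stack-based PIM architecture tailored for edge devices; and (3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations. Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel; (2) Mensa improves energy efficiency and throughput by 3.0x and 3.1x over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16.7x/1.4x for three binary NNs. We conclude that, due to inherent architectural design choices, the ideal PIM architecture for an NN model depends on the model's distinct attributes.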
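In this paper, federated learning (FL) over wireless networks is investigated. In each communication round, a subset of devices is selected to participate in the aggregation with limited time and energy. To minimize the convergence time, global loss and latency are jointly considered in a Stackelberg-game-based framework. Specifically, age-of-information (AoI) based device selection is formulated at the leader level as a global loss minimization problem, while sub-channel assignment, computational resource allocation, and power allocation are formulated at the follower level as a latency minimization problem. By dividing the follower-level problem into two sub-problems, the best response of the follower is obtained by a monotonic-optimization-based resource allocation algorithm and a matching-based sub-channel assignment algorithm. By deriving an upper bound on the convergence rate, the leader-level problem is reformulated, and a list-based device selection algorithm is then proposed to achieve the Stackelberg equilibrium. Simulation results indicate that the proposed device selection scheme outperforms other schemes in terms of global loss, and that the developed algorithms can significantly reduce the time consumption of computation and communication.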
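A widely studied deep reinforcement learning (RL) technique known as Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with probability proportional to their temporal-difference (TD) error. Although it has been shown that PER is one of the most crucial components for the overall performance of deep RL methods in discrete action domains, many empirical studies indicate that it considerably underperforms when used with actor-critic algorithms in continuous control. We show theoretically that actor networks cannot be effectively trained with transitions that have large TD errors. As a result, the approximate policy gradient computed under the Q-network diverges from the actual gradient computed under the optimal Q-function. Motivated by this, we introduce a novel experience replay sampling framework for actor-critic methods, which also accounts for stability issues and recent findings behind the poor empirical performance of PER. The introduced algorithm suggests a new branch of improvements for the effective and efficient training of both actor and critic networks. An extensive set of experiments verifies our theoretical claims and demonstrates that the introduced method significantly outperforms competing approaches and obtains state-of-the-art results compared to standard off-policy actor-critic algorithms.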
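For readers unfamiliar with the baseline being discussed, proportional prioritization can be sketched as follows. This is a minimal illustration of standard PER sampling (priority from the absolute TD error, sampling probability proportional to priority raised to a power, and importance-sampling weights), not the new sampling framework introduced in the abstract. The exponents `alpha` and `beta`, the small constant `eps`, and the per-batch weight normalization are illustrative assumptions.

```python
import random


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probabilities proportional to (|TD error| + eps) ** alpha."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(td_errors, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Draw transition indices by priority and return importance-sampling weights.

    Weights are (N * P(i)) ** -beta, normalized by the largest weight in the
    batch so that they only ever scale updates downwards.
    """
    rng = rng or random.Random()
    probs = per_probabilities(td_errors, alpha)
    n = len(td_errors)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    return indices, [w / max_w for w in weights]


# Usage: transitions with larger TD error are drawn more often.
td = [0.1, 2.0, 0.5]
idx, is_weights = sample_batch(td, batch_size=8, rng=random.Random(0))
```

Under this scheme, transitions with large TD error dominate each batch, which is exactly the regime the abstract argues is harmful for training the actor network.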
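Long-latency load requests continue to limit the performance of high-performance processors. To increase the latency tolerance of a processor, architects have primarily relied on two key techniques: sophisticated data prefetchers and large on-chip caches. In this work, we show that: 1) even a sophisticated state-of-the-art prefetcher can predict only half of the off-chip load requests on average across a wide range of workloads, and 2) due to the increasing size and complexity of on-chip caches, a large fraction of the latency of an off-chip load request is spent accessing the on-chip cache hierarchy. The goal of this work is to accelerate off-chip load requests by removing the on-chip cache access latency from their critical path. To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from main memory, while also concurrently accessing the cache hierarchy for such loads. To enable Hermes, we develop a new lightweight, perceptron-based off-chip load prediction technique that learns to identify off-chip load requests using multiple program features (e.g., sequence of program counters). For every load request, the predictor observes a set of program features to predict whether or not the load will go off-chip. If the load is predicted to go off-chip, Hermes issues a speculative request directly to the memory controller once the load's physical address is generated. If the prediction is correct, the load eventually misses the cache hierarchy and waits for the ongoing speculative request to finish, thus hiding the on-chip cache hierarchy access latency from the critical path of the off-chip load. Our evaluation shows that Hermes significantly improves the performance of a state-of-the-art baseline. We open-source Hermes.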
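The prediction step can be illustrated with a generic hashed-perceptron sketch: each program feature indexes its own table of small saturating weights, the selected weights are summed, and the sum is compared against a threshold. This is only an assumption-laden illustration of the general perceptron-predictor technique; the class name, table size, thresholds, weight range, and the two example features are hypothetical and are not taken from the Hermes design.

```python
class OffChipPredictor:
    """Toy hashed-perceptron predictor with one weight table per program feature."""

    def __init__(self, num_features, table_size=1024,
                 act_threshold=1, train_threshold=8, w_max=15):
        self.table_size = table_size
        self.act_threshold = act_threshold      # predict off-chip if sum >= this
        self.train_threshold = train_threshold  # keep training while |sum| is small
        self.w_max = w_max                      # weights saturate in [-w_max, w_max]
        self.tables = [[0] * table_size for _ in range(num_features)]

    def _indices(self, features):
        # Integer features are hashed into each table with a simple modulo.
        return [hash(f) % self.table_size for f in features]

    def _sum(self, features):
        return sum(t[i] for t, i in zip(self.tables, self._indices(features)))

    def predict(self, features):
        """Return True if the load is predicted to go off-chip."""
        return self._sum(features) >= self.act_threshold

    def train(self, features, went_off_chip):
        """Update weights on a misprediction or a low-confidence prediction."""
        total = self._sum(features)
        predicted = total >= self.act_threshold
        if predicted != went_off_chip or abs(total) < self.train_threshold:
            step = 1 if went_off_chip else -1
            for table, i in zip(self.tables, self._indices(features)):
                table[i] = max(-self.w_max, min(self.w_max, table[i] + step))


# Usage with two hypothetical features per load: (program counter, cache-line offset).
predictor = OffChipPredictor(num_features=2)
for _ in range(10):
    predictor.train((0x401, 3), went_off_chip=True)
    predictor.train((0x802, 5), went_off_chip=False)
```

In a Hermes-like flow, a positive prediction would trigger a speculative request to the memory controller in parallel with the normal cache lookup, and the true outcome of the load would later be fed back through `train`.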
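Machine-learning-based models have recently gained traction as a way to overcome the slow downstream implementation process of FPGAs by building models that provide fast and accurate performance predictions. However, these models suffer from two main limitations: (1) training requires large amounts of data (features extracted from FPGA synthesis and implementation reports), which is cost-inefficient because of the time-consuming FPGA design cycle; and (2) a model trained for a specific environment cannot make predictions for a new, unknown environment. In a cloud system, where access to platforms is typically costly, data collection for ML models can significantly increase the total cost of ownership (TCO) of a system. To overcome these limitations, we propose LEAPER, a transfer-learning-based approach for FPGA-based systems that adapts an existing ML-based model to a new, unknown environment to provide fast and accurate performance and resource utilization predictions. Experimental results show that our approach delivers, on average, 85% accuracy when we use our transferred model for prediction in a cloud environment with 5-shot learning, and reduces design-space exploration time from days to only a few hours.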
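Drones can provide a minimally constrained, adaptive camera view to support robot telemanipulation. Furthermore, the drone view can be automated to reduce the burden on the operator during teleoperation. However, existing approaches do not focus on two important aspects of using a drone as an automated view provider. The first is how the drone should select from a range of quality viewpoints within the workspace (e.g., opposite sides of an object). The second is how to compensate for unavoidable drone pose uncertainty. In this paper, we provide a nonlinear optimization method that yields effective and adaptive drone viewpoints for telemanipulation with an articulated manipulator. Our first key idea is to use sparse human-in-the-loop input to switch between multiple automatically generated drone viewpoints. Our second key idea is to introduce optimization objectives that maintain a view of the manipulator while considering drone uncertainty and its impact on viewpoint occlusion and environment collisions. We provide an instantiation of our drone viewpoint method within a drone-manipulator remote teleoperation system. Finally, we provide an initial validation of our method in tasks where we complete common household and industrial manipulations.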